Topic modeling in software engineering research

Abstract

Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and the modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research, with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modelled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducing names based on frequent words in a topic) is common.
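As a rough illustration of aspects (3) and (4), the sketch below shows one simple way text might be pre-processed and how a provisional topic name could be derived from a topic's most heavily weighted words. It is a minimal sketch and not the procedure of any surveyed paper: the stop-word list, the example sentence, the topic-word weights, and the helper names `preprocess` and `label_topic` are all assumptions made up for this example, and the weights merely stand in for the output of a fitted topic model such as LDA.

```python
import re

# Hypothetical stop-word list; real studies use larger, often domain-specific lists.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "to", "of", "and", "when", "i", "it"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words and tokens under 3 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def label_topic(word_weights, k=3):
    """Build a provisional label from the k highest-weighted words of a topic."""
    # Sort by descending weight; break ties alphabetically so the label is stable.
    top = sorted(word_weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return " / ".join(word for word, _ in top)


# Invented bug-report sentence, used only to show the pre-processing step.
tokens = preprocess("The app crashes on startup when the config file is missing")

# Invented topic-word weights standing in for one topic of a fitted model.
topic = {"crash": 0.21, "startup": 0.17, "exception": 0.12, "config": 0.08}
label = label_topic(topic)
```

A label produced this way is only a starting point; as the abstract notes, names are commonly assigned manually after inspecting the frequent words, and choices such as the stop-word list or minimum token length are exactly the kind of pre-processing detail that studies are found to report vaguely.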

Similar resources

Representativeness in Software Engineering Research

One of the goals of software engineering research is to achieve generality: Are the phenomena found in a few projects reflective of what goes on in others? Will a technique benefit more than just the projects it is evaluated on? The discipline of our community has gained rigor over the past twenty years and is now attempting to achieve generality through evaluation and study of an increasing nu...

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Context: Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeler is Latent Dirichlet allocation. When run on different datasets, LDA suffers from “order effects” i.e. different topics are generated if the order of training data is shuffled. Such order effects introduce a systematic error for any study. This error can relate to misleading results;...

Modeling Software Engineering Environment Capabilities

There is considerable interest today in designing open systems that permit tools to be moved freely among various environments on different hardware platforms. To develop such systems, terms such as open systems, features for open systems such as interoperability, and integration must all be precisely defined. We present a model that is an extension of a service-based reference model for develop...

Humanities Research Recommendations via Collaborative Topic Modeling

We present two novel applications of collaborative topic modeling to the broad datasets of humanities research article recommendations. In the first, we present an adaptation of the semi-supervised collaborative topic regression model to a situation in which no user feedback by simulating users to develop a much better content-based recommendation model (over 95% precision and relevant recall) th...

Research in Software Engineering: Paradigms and Methods

Software Engineering (SE) is a field without too much historic background. The youth of the SE discipline is resulted in an immaturity of this research field and SE research still lacks suitable scientific precision. Moreover, in SE research there are several objects of study with different nature each of them and, for this reason, different research and validation methods are needed. In view o...

Journal

Journal title: Empirical Software Engineering

Year: 2021

ISSN: 1382-3256, 1573-7616

DOI: https://doi.org/10.1007/s10664-021-10026-0